Lithuanian Continuous Speech Corpus LRN 1: An Improvement

Authors

  • Sigita Laurinčiukaitė
  • Mark Filipovič
  • Laimutis Telksnys
Abstract

This paper presents the development of the Lithuanian continuous speech corpus LRN 1 (Lithuanian Radio News, version 1). The corpus was developed from speech corpus LRN 0.1 by extending its duration; it now lasts 20 hours 50 minutes. The major improvement in LRN 1 is the addition of time-aligned word-level annotations of the speech signals. These annotations were obtained in a two-stage process: automatic realignment of acoustic models of phonemes, followed by manual correction of the annotations. The improved corpus is useful for constructing and evaluating speaker-independent continuous speech recognition systems and for linguistic research.

Similar resources

Lithuanian Continuous Speech Corpus LRN 0.1: Design and Potential Applications

This paper presents the design, development and contents of the Lithuanian continuous speech corpus LRN 0.1 (Lithuanian Radio News, prototype version 0.1). The corpus contains 17 hours 23 minutes of records from radio broadcast news read by 31 speakers. The recorded material is segmented into sentence-length records that are divided into training, development, and evaluation sets. Speech recordings are...

Towards Acoustic Modeling of Lithuanian Speech

In this paper we present an experimental investigation of various phone sets for acoustic modeling of Lithuanian speech, applied to large-vocabulary continuous speech recognition. The paper presents specifics of Lithuanian speech acoustics, including accentuation, diphthongs, and the softening and assimilation of consonants. The speech recognition experiments use only acoustic model since effective langua...

Corpus-Based Hidden Markov Modelling of the Fundamental Frequency of Lithuanian

This paper presents a corpus-driven approach to building a computational model of fundamental frequency, or F0, for the Lithuanian language. The model was obtained by training the HMM-based speech synthesis system HTS on six hours of speech from multiple speakers. Several gender-specific models, using different parameters and different contextual factors, were investigated. The models we...

From speech corpus to intonation corpus: clustering phrase pitch contours of Lithuanian

This paper presents our research in preparation for compiling a Lithuanian intonation corpus. The main objective of this research was to discover characteristic patterns of Lithuanian intonation by clustering the pitch contours of intermediate intonation phrases. The paper covers the set of procedures that were used to extend an ordinary speech corpus to make it suitable for intonation analysi...

Framework for Choosing a Set of Syllables and Phonemes for Lithuanian Speech Recognition

This paper describes a framework for selecting a set of syllables and phonemes that is subsequently used to create acoustic models for continuous speech recognition of Lithuanian. The goal is to discover the set of syllables and phonemes that is of utmost importance in speech recognition. The framework includes operations on the lexicon and on transcriptions of records. To facilitate this...

Publication date: 2009